IPMicra: An IP-address based Location Aware Distributed Web Crawler
Abstract
Distributed crawling can overcome important limitations of traditional single-source web crawling systems. However, its optimal benefit is usually limited to the sites hosting the crawlers; the remaining URLs are, by and large, distributed randomly among the various crawlers. In this work, we propose a location-aware method, called IPMicra, that utilizes an IP address hierarchy and allows links to be crawled in a near-optimal, location-aware manner. Our proposal outperforms earlier distributed crawling schemes, requiring an order of magnitude less time to crawl the same set of sites.
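The abstract does not spell out how the IP address hierarchy is consulted, so the following is only a minimal sketch of one plausible reading: each crawler is made responsible for some subnets, and a target host is delegated to the crawler owning the most specific subnet that contains its IP address. The subnet table, the crawler names, and the function `assign_crawler` are all hypothetical and are not taken from the paper.

```python
import ipaddress

# Hypothetical delegation table (an assumption, not from the paper):
# each subnet in the hierarchy is mapped to the crawler presumed closest to it.
SUBNET_TO_CRAWLER = {
    "0.0.0.0/0": "crawler-default",       # root of the hierarchy, catch-all
    "192.0.2.0/24": "crawler-a",
    "198.51.100.0/24": "crawler-b",
    "198.51.100.128/25": "crawler-c",     # more specific child of the /24 above
}


def assign_crawler(ip, table=SUBNET_TO_CRAWLER):
    """Return the crawler owning the most specific subnet that contains ip."""
    addr = ipaddress.ip_address(ip)
    best_net = None
    best_crawler = None
    for cidr, crawler in table.items():
        net = ipaddress.ip_network(cidr)
        # Keep the matching subnet with the longest prefix seen so far.
        if addr in net and (best_net is None or net.prefixlen > best_net.prefixlen):
            best_net, best_crawler = net, crawler
    return best_crawler


if __name__ == "__main__":
    for host_ip in ("192.0.2.1", "198.51.100.5", "198.51.100.200", "203.0.113.9"):
        print(host_ip, "->", assign_crawler(host_ip))
```

A linear scan keeps the sketch short; a real system handling many subnets would more likely use a prefix tree, and would need some way to measure which crawler is actually nearest to each subnet, which this sketch simply assumes is known.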
Related resources
IPMicra: Toward a Distributed and Adaptable Location Aware Web Crawler
Distributed crawling has shown that it can overcome important limitations of the centralized crawling paradigm. However, the distributed nature of current distributed crawlers is currently not fully utilized. The optimal benefits of this approach are usually limited to the sites hosting the crawler. In this work we propose IPMicra, a distributed location aware web crawler that utilizes an IP ad...
Minimizing the Network Distance in Distributed Web Crawling
Distributed crawling has shown that it can overcome important limitations of the centralized crawling paradigm. However, the distributed nature of current distributed crawlers is currently not fully utilized. The optimal benefits of this approach are usually limited to the sites hosting the crawler. In this work we describe IPMicra, a distributed location aware web crawler that utilizes an IP a...
On Location Aware Internet
An important aspect of performance for Internet-based applications is network delay (measured in terms of bandwidth and latency). The contribution and the basic motivation of Location Aware Internet is to enable clients to easily find the nearest (in terms of latency) out of a number of servers that can service a specific request. Location Aware Internet can significantly improve the performanc...
Automating Geography-Based Redirection
Reliability and performance considerations have led to widespread use of mirror sites for content delivery. Based on measurements, redirection to a geographically closer replica is indeed valuable. Given an IP address, a mapping tool has to return the geographic location of the host to which the IP address has been assigned. This is a difficult problem because an IP address does not inherently ...
Publication date: 2004